

# | Type | Hits | Search Text | DBs | Time Stamp
1 | BRS | 15317 | 707/1-7,9,10.ccls. | US-PGPUB; USPAT | 2005/03/24 09:24
2 | BRS | 10251 | 707/100-104.ccls. | US-PGPUB; USPAT | 2005/03/24 09:25
3 | BRS | 2871 | 707/200,203.ccls. | US-PGPUB; USPAT | 2005/03/24 09:25
4 | BRS | 9980 | 705/1,14,26,27.ccls. | US-PGPUB; USPAT | 2005/03/24 09:26
5 | BRS | 22939 | 709/200,201-203,217-219,223-229.ccls. | US-PGPUB; USPAT | 2005/03/24 09:27
6 | BRS | 8849 | 715/500,501.1,513-517,526,700,733,738,760,764,810,848,853,854,866,965-968.ccls. | US-PGPUB; USPAT | 2005/03/24 09:32
7 | BRS | 5800 | 713/200,201.ccls. | US-PGPUB; USPAT | 2005/03/24 09:32
8 | BRS | 1563 | 345/418,473.ccls. | US-PGPUB; USPAT | 2005/03/24 09:33
9 | BRS | 23558 | S1 or S2 or S3 | US-PGPUB; USPAT | 2005/03/30 08:17
10 | BRS | 46223 | S4 or S5 or S7 or S6 or S8 | US-PGPUB; USPAT | 2005/03/30 08:17
11 | BRS | 15317 | 707/1-7,9,10.ccls. | US-PGPUB; USPAT | 2005/03/25 08:18
12 | BRS | 10251 | 707/100-104.ccls. | US-PGPUB; USPAT | 2005/03/25 08:18
13 | BRS | 2871 | 707/200,203.ccls. | US-PGPUB; USPAT | 2005/03/25 08:18
14 | BRS | 23558 | S11 or S12 or S13 | US-PGPUB; USPAT | 2005/03/25 08:18
15 | BRS | 9980 | 705/1,14,26,27.ccls. | US-PGPUB; USPAT | 2005/03/25 08:18
16 | [illegible] | [illegible] | 709/200,201-203,217-219,223-229.ccls. | US-PGPUB; USPAT | 2005/03/25 08:18
17 | BRS | 8849 | 715/500,501.1,513-517,526,700,733,738,760,764,810,848,853,854,866,965-968.ccls. | US-PGPUB; USPAT | 2005/03/25 08:18
18 | BRS | 5800 | 713/200,201.ccls. | US-PGPUB; USPAT | 2005/03/25 08:18
19 | BRS | 1563 | 345/418,473.ccls. | US-PGPUB; USPAT | 2005/03/25 08:18
20 | BRS | 46223 | S15 or S16 or S18 or S17 or S19 | US-PGPUB; USPAT | 2005/03/25 08:18
21 | BRS | 2327 | (accept$4 with (query or queries)) and input$4 | US-PGPUB; USPAT | 2005/03/25 08:20
22 | BRS | 33312 | search$4 and (data adj structure$2) | US-PGPUB; USPAT | 2005/03/25 08:21



3/30/05, EAST Version: 2.0.1.4 



23 | BRS | 1069391 | advertiser$2 and (web with page$2) abd information | US-PGPUB; USPAT | 2005/03/25 08:22



24 | BRS | 3087 | advertiser$2 and (web with page$2) and information | US-PGPUB; USPAT | 2005/03/25 08:22
25 | BRS | 8834 | accept$4 and (search$4 with results) and generat$4 | US-PGPUB; USPAT | 2005/03/25 08:23
26 | BRS | 1041 | retriev$4 with advertisement$2 | US-PGPUB; USPAT | 2005/03/25 08:24
27 | BRS | 583 | inverted adj2 (file$2 or index$2 or indices) | US-PGPUB; USPAT | 2005/03/25 08:25
28 | BRS | 30606 | world adj wide adj web | US-PGPUB; USPAT | 2005/03/25 08:26
29 | BRS | 308 | S21 and S22 and S25 | US-PGPUB; USPAT | 2005/03/25 08:27
30 | BRS | 1 | S21 and S22 and S25 and S24 and S26 | US-PGPUB; USPAT | 2005/03/25 08:27
31 | BRS | 18 | S21 and S22 and S25 and S24 | US-PGPUB; USPAT | 2005/03/25 08:28
32 | BRS | 18 | S21 and S22 and S25 and S24 and retriev$4 | US-PGPUB; USPAT | 2005/03/25 08:28
33 | BRS | 15 | S21 and S22 and S25 and S24 and retriev$4 and S28 | US-PGPUB; USPAT | 2005/03/29 13:40
34 | BRS | 2330 | (accept$4 with (query or queries)) and input$4 | US-PGPUB; USPAT | 2005/03/29 13:40
35 | BRS | 33347 | search$4 and (data adj structure$2) | US-PGPUB; USPAT | 2005/03/29 13:40
36 | BRS | 8851 | accept$4 and (search$4 with results) and generat$4 | US-PGPUB; USPAT | 2005/03/29 13:40
37 | BRS | 30648 | world adj wide adj web | US-PGPUB; USPAT | 2005/03/29 13:40
38 | BRS | 136 | S34 and S35 and S36 and retriev$4 and S37 | US-PGPUB; USPAT | 2005/03/29 13:41
39 | BRS | 3 | S34 and S35 and S36 and retriev$4 and S37 and (inverted with (index$2 or indices)) and (term$2 with count$4) | US-PGPUB; USPAT | 2005/03/29 13:42
40 | BRS | 14 | S34 and S35 and S36 and retriev$4 and S37 and (inverted with (index$2 or indices)) | US-PGPUB; USPAT | 2005/03/29 [time illegible]
41 | BRS | 364 | ((ad$2 or advertisment$2) with perform$6) and ((ad$2 or advertisment$2) with price$2) | US-PGPUB; USPAT | 2005/03/29 13:56
42 | BRS | 10 | S34 and S36 and ((ad$2 or advertisment$2) with perform$6) and ((ad$2 or advertisment$2) with price$2) | US-PGPUB; USPAT | 2005/03/29 13:56
43 | BRS | 15346 | 707/1-7,9,10.ccls. | US-PGPUB; USPAT | 2005/03/30 08:17
44 | BRS | 10278 | 707/100-104.ccls. | US-PGPUB; USPAT | 2005/03/30 08:17
45 | BRS | 2879 | 707/200,203.ccls. | US-PGPUB; USPAT | 2005/03/30 08:17
46 | BRS | 23605 | S43 or S44 or S45 | US-PGPUB; USPAT | 2005/03/30 08:17
47 | BRS | 9995 | 705/1,14,26,27.ccls. | US-PGPUB; USPAT | 2005/03/30 08:17
48 | BRS | 22982 | 709/200,201-203,217-219,223-229.ccls. | US-PGPUB; USPAT | 2005/03/30 08:17
49 | BRS | 8864 | 715/500,501.1,513-517,526,700,733,738,760,764,810,848,853,854,866,965-968.ccls. | US-PGPUB; USPAT | [date illegible] 08:17
50 | BRS | 5812 | 713/200,201.ccls. | US-PGPUB; USPAT | 2005/03/30 08:17
51 | BRS | 1567 | 345/418,473.ccls. | US-PGPUB; USPAT | 2005/03/30 08:17
52 | BRS | 46302 | S47 or S48 or S50 or S49 or S51 | US-PGPUB; USPAT | 2005/03/30 08:17
?ds

Set  Items  Description
S1   73301  ADVERTIS? OR ADVERTIZ? OR AD OR ADS OR PROMOTION? OR ADVERT? ? OR COMMERCIAL()MESSAGE?
S2   1181   (INVERT?)(2N)(INDEX? OR INDICES OR FILE OR FILES OR FILING? OR LIST OR LISTS OR LISTING? OR STRUCTURE?)
S3   21055  (INTERNET OR WEB OR ONLINE OR ON()LINE OR HOME)(2N)(PAGE OR PAGES OR SITE OR SITES OR PORTAL? OR DIRECTOR?)
S4   34902  (E OR ELECTRONIC OR DIGITAL OR VIRTUAL)(1W)(MAIL??? OR MESSAG??? OR CORRESPOND?) OR EMAIL???? OR (INTERNET OR ON()LINE OR ONLINE OR WEB)(1W)MAIL???? OR MIME OR SMTP OR POP(IN)MAIL
S5   3      S1 AND S2
S6   1      S2 AND (S3 OR S4)
S7   17     S2 AND (INTERNET OR WEB OR WWW OR ONLINE OR ON()LINE)
S8   7      S7 AND IC=G06F?
?show files
?ds

Set  Items  Description
S1   73140  ADVERTIS? OR ADVERTIZ? OR AD OR ADS OR PROMOTION? OR ADVERT? ? OR COMMERCIAL()MESSAGE?
S2   1938   (INVERT? OR FORWARD)(2N)(INDEX? ? OR INDICES OR FILE OR FILES OR FILING? OR LIST OR LISTS OR LISTING? OR STRUCTURE?)
S3   20991  (INTERNET OR WEB OR ONLINE OR ON()LINE OR HOME)(2N)(PAGE OR PAGES OR SITE OR SITES OR PORTAL? OR DIRECTOR?)
S4   29975  (E OR ELECTRONIC)(1W)(MAIL??? OR MESSAG??? OR CORRESPOND?) OR EMAIL???? OR (INTERNET OR ON()LINE OR ONLINE OR WEB)(1W)MAIL????
S5   5      S1 AND S2
S6   10     S3:S4 AND S2
S7   42     S2 AND (WEB OR INTERNET OR WWW OR ONLINE OR ON()LINE)
S8   1      S7 AND S1
S9   1167   INVERT?(2N)(INDEX? ? OR INDICES OR FILE OR FILES OR FILING? OR LIST OR LISTS OR LISTING? OR STRUCTURE?)
S10  17     S9 AND (WEB OR INTERNET OR WWW OR ONLINE OR ON()LINE)
S11  7      S10 AND IC=G06F?
?show files
6/5/7 (Item 7 from file: 350)

DIALOG (R) File 350: Derwent WPIX

(c) 2004 Thomson Derwent. All rts. reserv.

014004914 **Image available** 

WPI Acc No: 2001-489128/200154 

XRPX Acc No: N01-361923 

Method for storing information about web documents such as pages or sites 
in a manner that may be used in conjunction with inverted term lists 
to facilitate the retrieval of documents of interest from the web 

Patent Assignee: GTE LAB INC (SYLV); VERIZON LAB INC (VERI-N)

Inventor: PONTE J M

Number of Countries: 002 Number of Patents: 002
Patent Family:

Patent No    Kind  Date      Applicat No  Kind  Date      Week
CA 2310931   A1    20010130  CA 2310931   A     20000607  200154 B
US 6665665   B1    20031216  US 99365326  A     19990730  200382

Priority Applications (No Type Date): US 99365326 A 19990730
Patent Details:

Patent No    Kind  Lan  Pg   Main IPC     Filing Notes
CA 2310931   A1    E    105  G06F-017/30
US 6665665   B1              G06F-017/30

Abstract (Basic): CA 2310931 A1

NOVELTY - The method for storing information about Web documents
such as pages or sites in a manner that may be used in conjunction with
inverted term lists to facilitate the retrieval of documents of
interest from the Web. The method involves constructing compressed
surrogates for each document in the database and inserting in the
compressed documents surrogate information about terms which occur in
the document, such that various operations may be performed without the
need to retrieve a copy of the document from the Web. Inverted term
lists that contain information about terms that occur in the database
are also created in conjunction with creation of the compressed
document surrogates.

DETAILED DESCRIPTION - INDEPENDENT CLAIMS are included for:
(i) a device for maintaining information about a collection of documents in a
data base to facilitate determining which documents may be of interest;
(ii) a device for modifying a collection of inverted term lists;
(iii) a device for determining the score for a document under a search
query which specifies terms that are desired to be present or absent;
(iv) a device for returning a list of a desired number of documents N
in order of predicted utility, from among a collection of documents, as
predicted by a search query containing terms desired to be present or
absent.

USE - For maintaining information about material on the World Wide
Web to facilitate retrieval of web pages of interest to a user that
relate to electronic commerce.

ADVANTAGE - Permits efficient updating of inverted term lists
when documents on the Web have been modified or deleted, and also
permits the efficient processing of search queries in a variety of
circumstances.

DESCRIPTION OF DRAWING(S) - The drawing shows a schematic diagram
of the computer system.
computer system (1)
pp; 105 DwgNo 1/14
Title Terms: METHOD; STORAGE; INFORMATION; WEB; DOCUMENT; PAGE; SITE; 

MANNER; CONJUNCTION; INVERT; TERM; LIST; FACILITATE; RETRIEVAL; DOCUMENT; 
INTEREST; WEB 
Derwent Class: T01; W01 

International Patent Class (Main) : G06F-017/30 

International Patent Class (Additional): G06F-007/00; H04L-012/24 
File Segment: EPI 
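
Illustrative note (not part of the retrieved record): the abstract above pairs inverted term lists with per-document surrogates so that the lists can be updated when a page changes or disappears, without fetching the page again. The Python sketch below is only one possible reading of that idea. The class name TermIndex, its methods, the plain term-count surrogate (no compression is attempted) and the count-based score are all assumptions made for illustration; none of them comes from the patent.

```python
from collections import defaultdict


class TermIndex:
    """Hypothetical sketch: inverted term lists plus per-document surrogates."""

    def __init__(self):
        self.postings = defaultdict(dict)   # term -> {doc_id: term count}
        self.surrogates = {}                # doc_id -> {term: term count}

    def add(self, doc_id, text):
        # Re-adding a document replaces its old entries first.
        if doc_id in self.surrogates:
            self.remove(doc_id)
        counts = {}
        for term in text.lower().split():
            counts[term] = counts.get(term, 0) + 1
        self.surrogates[doc_id] = counts
        for term, n in counts.items():
            self.postings[term][doc_id] = n

    def remove(self, doc_id):
        # The surrogate tells us which term lists to touch, so the
        # original document never has to be retrieved again.
        for term in self.surrogates.pop(doc_id, {}):
            plist = self.postings[term]
            del plist[doc_id]
            if not plist:
                del self.postings[term]

    def search(self, present, absent=()):
        # Score = sum of counts of the desired terms (an assumed scoring rule);
        # documents containing any unwanted term are dropped.
        scores = defaultdict(int)
        for term in present:
            for doc_id, n in self.postings.get(term, {}).items():
                scores[doc_id] += n
        excluded = set()
        for term in absent:
            excluded.update(self.postings.get(term, {}))
        hits = [d for d in scores if d not in excluded]
        return sorted(hits, key=lambda d: (-scores[d], d))
```

In this sketch a deletion costs time proportional to the number of distinct terms in the removed document, which is the kind of update the ADVANTAGE paragraph refers to.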
11/5/1 (Item 1 from file: 347)

DIALOG (R) File 347: JAPIO 

(c) 2004 JPO & JAPIO. All rts. reserv. 

06103734 **Image available** 

WEB DOCUMENT RETRIEVAL SUPPORTING DEVICE AND COMPUTER READABLE RECORDING 
MEDIUM RECORDED WITH PROGRAM FOR FUNCTIONING COMPUTER AS THE DEVICE 



PUB. NO. : 11-045257 [JP 11045257 A] 

PUBLISHED: February 16, 1999 (19990216) 

INVENTOR (s): WAKASUGI TAKASHI

APPLICANT (s) : JUST SYST CORP 

APPL. NO. : 09-199618 [JP 97199618] 

FILED: July 25, 1997 (19970725) 

INTL CLASS: G06F-017/30 



ABSTRACT 



PROBLEM TO BE SOLVED: To automatically classify gathered Web documents 
into prepared respective categories and to reduce labor required for the 
classifying work of the Web documents. 

SOLUTION: This device is provided with a Web document gathering software 
104 for gathering the Web documents, an inverted file 109 for storing 
retrieval information used for retrieving the gathered Web documents, a 
category management software 105 for inputting retrieval conditions, 
setting the inputted retrieval conditions as classification items and 
presenting the classification items corresponding to a request from a Web 
client and a retrieval software 106 for retrieving the pertinent Web 
document by using the retrieval information stored in the inverted file 
109 based on the selected classification item when the classification item 
is selected in the Web client. In this case, the category management
software 105 presents the list of the pertinent Web documents to the Web 
client based on the retrieved result of the retrieval software 106. 

COPYRIGHT: (C) 1999, JPO 



11/5/2 (Item 2 from file: 347) 

DIALOG (R) File 347: JAPIO 

(c) 2004 JPO & JAPIO. All rts. reserv. 

03060071 **Image available** 
ON-LINE INFORMATION RETRIEVAL SUPPORTING DEVICE

PUB. NO.: 02-035571 [JP 2035571 A]
PUBLISHED: February 06, 1990 (19900206)
INVENTOR (s): MORITA TETSUYA
APPLICANT (s): RICOH CO LTD [000674] (A Japanese Company or Corporation), JP (Japan)
APPL. NO.: 63-184477 [JP 88184477]
FILED: July 26, 1988 (19880726)
INTL CLASS: [5] G06F-015/40; G06F-012/00
JAPIO CLASS: 45.4 (INFORMATION PROCESSING -- Computer Applications); 45.2 (INFORMATION PROCESSING -- Memory Units)
JOURNAL: Section: P, Section No. 1037, Vol. 14, No. 189, Pg. 161, April 17, 1990 (19900417)



ABSTRACT 

PURPOSE: To efficiently execute high-speed retrieval by down loading
information from an on-line data base after limiting a retrieval object
file to proper quantity by a private file.

CONSTITUTION: For the side of the terminal of a user, a private data base
20 having an inherent keyword file 17 peculiar to the user, a covalent
keyword file 18, and an inherent inverted file 19 to indicate the
corresponding relations between an inherent keyword and the on-line
data base is provided. When the on-line data base is to be retrieved,
first, the retrieval object file is limited to the proper quantity by the
private data base 20 with a command and a data format standardized for the
user, and after that, the information is down loaded from the on-line
data base with the standardized command for the user.



11/5/3 (Item 1 from file: 350) 

DIALOG (R) File 350:Derwent WPIX 

(c) 2004 Thomson Derwent . All rts. reserv. 

016606405 **Image available** 

WPI Acc No: 2004-765139/200475 

XRPX Acc No: N04-603601 

Inverted index storing method for internet searching, involves 
storing index information related to same index item in continuous blocks 
and using index units in each index block for storing index information 
related to same index item 

Patent Assignee: INT BUSINESS MACHINES CORP (IBMC)

Inventor: PAN Y; SU Z; YANG LP

Number of Countries: 001 Number of Patents: 001
Patent Family:

Patent No       Kind  Date      Applicat No    Kind  Date      Week
US 20040205044  A1    20041014  US 2004818833  A     20040406  200475 B

Priority Applications (No Type Date): CN 2003109847 A 20030411
Patent Details:

Patent No       Kind  Lan  Pg  Main IPC     Filing Notes
US 20040205044  A1         18  G06F-017/30

Abstract (Basic): US 20040205044 A1

NOVELTY - The index information related to each index item is
sequentially stored in the newly created inverted file, such that
the index information related to the same index item is stored in
continuous blocks, and the index units in each index block are only
used for storing the index information related to the same index item.

DETAILED DESCRIPTION - INDEPENDENT CLAIMS are also included for the
following:
(1) program product for storing inverted index; and
(2) inverted index mechanism.

USE - For full-text retrieval in internet search.

ADVANTAGE - No need to relocate the reading pointer to the file,
when reading the index information on an arbitrarily chosen index item,
thus reducing the file reading time. When performing an operation on
the index information in an index block, other index items are not
affected, thus it is possible to on-line update the index
information in any index block through a simple locking-unlocking
method, without having to stop searching service.

DESCRIPTION OF DRAWING(S) - The figure shows flowcharts of method
for storing inverted index.

pp; 18 DwgNo 1C/8

Title Terms: INVERT; INDEX; STORAGE; METHOD; SEARCH; STORAGE; INDEX; 

INFORMATION; RELATED; INDEX; ITEM; CONTINUOUS; BLOCK; INDEX; UNIT; INDEX; 

BLOCK; STORAGE; INDEX; INFORMATION; RELATED; INDEX; ITEM 
Derwent Class: T01 

International Patent Class (Main) : G06F-017/30 
File Segment: EPI 
015460527 **Image available**
XRPX Acc No: N03-414759

Internet based data record search method in database, involves
constructing query corresponding to given search criteria and executing
it on identified regions of database

Patent Assignee: MICROSOFT CORP (MICT)
Inventor: AGRAWAL S; CHAUDHURI S

Number of Countries: 001 Number of Patents: 002
Patent Family:

Patent No       Kind  Date      Applicat No   Kind  Date      Week
US 20030078915  A1    20030424  US 200136348  A     20011019  200349 B
US 6792414      B2    20040914  US 200136348  A     20011019  200460

Priority Applications (No Type Date): US 200136348 A 20011019
Patent Details:

Patent No       Kind  Lan  Pg  Main IPC     Filing Notes
US 20030078915  A1         26  G06F-007/00
US 6792414      B2             G06F-017/30

Abstract (Basic): US 20030078915 A1

NOVELTY - An inverted list of keywords that maps the data
record components to a region of database containing corresponding data
record, is created. The regions of database containing data records
relating to the given search keyword, are identified by accessing the
inverted list. A query is constructed corresponding to the given
search criteria and is executed on the identified regions of database.

DETAILED DESCRIPTION - INDEPENDENT CLAIMS are also included for the
following:
(1) computer readable medium storing instructions to perform data
record search process; and
(2) data records search apparatus.

USE - For searching data records in database comprising address
information of employee, mailing list information, product and sales
details.

ADVANTAGE - The records matching the search criteria are 
efficiently retrieved by executing the query on the identified regions. 
The keyword searching on relational database is made efficient. 

DESCRIPTION OF DRAWING (S) - The figure shows the flowchart of data 
record search process. 

pp; 26 DwgNo 6/24 

Title Terms: BASED; DATA; RECORD; SEARCH; METHOD; DATABASE; CONSTRUCTION; 

QUERY; CORRESPOND; SEARCH; CRITERIA; EXECUTE; IDENTIFY; REGION; DATABASE 
Derwent Class: T01 

International Patent Class (Main) : G06F-007/00 ; G06F-017/30 
File Segment: EPI 
WPI Acc No: 2003-429970/200340

XRPX Acc No: N03-343391

Interactive multimedia delivery system used in networking system, has
retriever that exploits specific statistics maintained by inverted
indices, to rank relevance of nodes for new annotation set

Patent Assignee: KNUMI INC (KNUM-N)

Inventor: DEY J K; SIVASANKARAN R M

Number of Countries: 001 Number of Patents: 001
Patent Family:

Patent No       Kind  Date      Applicat No    Kind  Date      Week
US 20030061028  A1    20030327  US 2001956889  A     20010921  200340 B

Priority Applications (No Type Date): US 2001956889 A 20010921
Patent Details:

Patent No       Kind  Lan  Pg  Main IPC     Filing Notes
US 20030061028  A1         17  G06F-017/27

Abstract (Basic): US 20030061028 A1

NOVELTY - The system derives data from the previous mappings of
annotations to nodes in an ontology. A retriever exploits specific
statistics maintained by inverted indices, to rank the relevance of
the nodes for a new annotation set. The information related to new
annotation, is extracted from the most relevant node and database
through network.

DETAILED DESCRIPTION - INDEPENDENT CLAIMS are also included for the 
following : 

(1) ontology searching method; 

(2) contextual information retrieval method; 

(3) contextual information retrieval system; 

(4) annotation mapping method; and 

(5) article of manufacture comprises computer medium with computer 
readable program code for searching ontology. 

USE - Interactive multimedia delivery system implemented on
multi-nodal system e.g. LAN or networking system e.g. Internet, world
wide web (WWW), wireless web, for delivering annotations through
television, computer handheld device or telephone.

ADVANTAGE - The multimedia authoring environment enables broadband 
producer to rapidly create a document that integrates multimedia 
content with other content that is relevant to the multimedia segment. 

DESCRIPTION OF DRAWING (S) - The figure shows the ways for obtaining 
various multimedia document annotations. 

pp; 17 DwgNo 8/8 

Title Terms: INTERACT; DELIVER; SYSTEM; SYSTEM; RETRIEVAL; EXPLOIT;
SPECIFIC; STATISTICAL; MAINTAIN; INVERT; INDEX; RANK; RELEVANT; NODE; NEW;
SET

Derwent Class: T01; W02 

International Patent Class (Main) : G06F-017/27 
File Segment: EPI 
Method for storing information about web documents such as pages or
sites in a manner that may be used in conjunction with inverted term
lists to facilitate the retrieval of documents of interest from the web

Patent Assignee: GTE LAB INC (SYLV); VERIZON LAB INC (VERI-N)
Inventor: PONTE J M

Number of Countries: 002 Number of Patents: 002
Patent Family:

Patent No    Kind  Date      Applicat No  Kind  Date      Week
CA 2310931   A1    20010130  CA 2310931   A     20000607  200154 B
US 6665665   B1    20031216  US 99365326  A     19990730  200382

Priority Applications (No Type Date): US 99365326 A 19990730
Patent Details:

Patent No    Kind  Lan  Pg   Main IPC     Filing Notes
CA 2310931   A1    E    105  G06F-017/30
US 6665665   B1              G06F-017/30

Abstract (Basic): CA 2310931 A1

NOVELTY - The method for storing information about Web documents 
such as pages or sites in a manner that may be used in conjunction with 

inverted term lists to facilitate the retrieval of documents of 
interest from the Web . The method involves constructing compressed 
surrogates for each document in the database and inserting in the 
compressed documents surrogate information about terms which occur in 
the document, such that various operations may be performed without the 
need to retrieve a copy of the document from the Web . Inverted term 
lists that contain information about terms that occur in the database 
are also created in conjunction with creation of the compressed 
document surrogates. 

DETAILED DESCRIPTION - INDEPENDENT CLAIMS are included for: (i) a 
device for maintaining information about a collection of documents in a 



data base to facilitate determining which documents may be of interest; 

(ii) a device for modifying a collection of inverted term lists ; 

(iii) a device for determining the score for a document under a search 
query which specifies terms that are desired to be present or absent; 

(iv) a device for returning a list of a desired number of documents N 
in order of predicted utility, from among a collection of documents, as 
predicted by a search query containing terms desired to be present or 
absent . 

USE - For maintaining information about material on the World Wide 
Web to facilitate retrieval of web pages of interest to a user that 
relate to electronic commerce. 

ADVANTAGE - Permits efficient updating of inverted term lists 
when documents on the Web have been modified or deleted, and also 
permits the efficient processing of search queries in a variety of 
circumstances . 

DESCRIPTION OF DRAWING (S) - The drawing shows a schematic diagram 
of the computer system, 
The database access system provides query searches into very large
full-text document databases, e.g. Internet search agents. The search
system uses an inverted index database containing document and term
frequency data. A static cache is created from the inverted index.
This contains entries in contribution order for inverted index
entries with large numbers of associated documents.

Each term of a query is initially searched in the static cache and
term contributions combined. An additional look-up table is included in
the cache to aid in accessing the inverted index database.

ADVANTAGE - Provides more rapid access and query evaluation of
search database than directly evaluating inverted index.
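
Illustrative note (not part of the retrieved record): the static cache described above can be pictured as a truncated, contribution-ordered copy of the longest posting lists, consulted before the full inverted index. The Python sketch below assumes an in-memory index of the form term -> {doc_id: contribution}. The function names, the min_postings and top_k parameters and the additive scoring are hypothetical, and the additional look-up table mentioned in the abstract is not modelled.

```python
def build_static_cache(index, min_postings, top_k):
    """Cache the top_k highest-contribution postings of each long term list."""
    cache = {}
    for term, postings in index.items():
        if len(postings) >= min_postings:
            # Entries are kept in descending contribution order.
            ranked = sorted(postings.items(), key=lambda kv: (-kv[1], kv[0]))
            cache[term] = ranked[:top_k]
    return cache


def evaluate(query_terms, index, cache):
    """Look each term up in the cache first, then fall back to the index."""
    scores = {}
    for term in query_terms:
        entries = cache.get(term)
        if entries is None:
            entries = index.get(term, {}).items()
        for doc_id, contribution in entries:
            scores[doc_id] = scores.get(doc_id, 0.0) + contribution
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Because the cached lists are truncated, a result computed this way is an approximation of evaluating the full index; that trade-off is an assumption of the sketch, not a statement about the patented system.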
Abstract: We identify crucial design issues in building a distributed
inverted index for a large collection of Web pages. We introduce a
novel pipelining technique for structuring the core index-building system
that substantially reduces the index construction time. We also propose a
storage scheme for creating and managing inverted files using an
embedded database system. We suggest and compare different strategies for
collecting global statistics from distributed inverted indexes.
Finally, we present performance results from experiments on a testbed
distributed Web indexing system that...
the librarian. The various Web search sites make use of "traditional"
... can be browsed or searched to more accurately and quickly meet
Abstract: Over the last few years, most major search engines have
integrated link-based ranking techniques in order to provide more accurate 
search results. One widely known approach is the Pagerank technique, which 
forms the basis of the Google ranking scheme, and which assigns a global
importance measure to each page based on the importance of other pages 
pointing to it. The main advantage of the Pagerank measure is that it is 
independent of the query posed by a user; this means that it can be 
precomputed and then used to optimize the layout of the inverted index 
structure accordingly. However, computing the Pagerank measure requires 
implementing an iterative process on a massive graph corresponding to 
billions of web pages and hyperlinks. In this paper, we study 
I/O-efficient techniques to perform this iterative computation. We derive
two algorithms for Pagerank based on techniques proposed for out-of-core 
graph algorithms, and compare them to two existing algorithms proposed by 
Haveliwala. We also consider the implementation of a recently proposed 
topic-sensitive version of Pagerank. Our experimental results show that 
for very large data sets, significant improvements over previous results 
can be achieved on machines with moderate amounts of memory. On the other
hand, at most minor improvements are possible on data sets that are only 
moderately larger than memory, which is the case in many practical 
scenarios. 43 Refs. 

Descriptors: *World Wide Web; Search engines; Iterative methods; 
Algorithms; Graph theory; Optimization 

Identifiers: Pagerank technique; External memory algorithms 
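
Illustrative note (not part of the retrieved record): the iterative process that the abstract refers to is commonly written as a power iteration over the link graph. The Python sketch below is a plain in-memory version for a small graph. It is not one of the I/O-efficient algorithms studied in the paper, and the function name, the damping value of 0.85, the fixed iteration count and the even redistribution of rank from pages without out-links are assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it points to."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every page receives the same baseline share.
        nxt = {v: (1.0 - damping) / n for v in nodes}
        dangling = 0.0
        for v in nodes:
            out = links.get(v, [])
            if out:
                share = damping * rank[v] / len(out)
                for target in out:
                    nxt[target] += share
            else:
                # Pages with no out-links spread their rank evenly.
                dangling += damping * rank[v]
        for v in nodes:
            nxt[v] += dangling / n
        rank = nxt
    return rank
```

Each pass touches every link once, which is why the paper's concern is how to stream the rank vector and the link data from disk when the graph does not fit in memory.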
Human maintained search engines are expensive, slow to update, and
cannot cover all the web pages. Automated search engines that rely on
keyword matching usually return too many low quality results, with most 
users only looking at the first few tens of the search results. Because 
search engine development has gone on at companies with little publication 
of technical details, it is a challenging task to develop a search engine. 
The use of hypertextual information can help to improve search quality. 
This report addresses the question of how to build an inverted index 
for a search system that can use the additional information presented in 
hypertext to produce better search results. This report is part of the work 
of the Concordia INdexing and Discovery (CINDI) Digital Library System. In 
this report, we summarize the research work I have done; we present some 
implementation issues for the project; and present the data structures that 
can be used in indexing web pages. The design decision was driven by
the desire to have a reasonable compact data structure, and the ability to 
fetch a record in few disk seeks during a search. This project has been 
implemented in C++ on Linux platform. 
Abstract: At present time most explorers are created as simple inverted
index and system ranking, which gives the largest speed and efficiency
of explorer. This article is the comparison and classification of modern
systems for indexing internet sites, classification, building and
proposed solutions. (3 Refs)
Conference Date: 23-26 Sept. 2001 Conference Location: Santorini/Thera, Greece

Language: English Document Type: Conference Paper (PA) 

Treatment: Practical (P); Experimental (X) 

Abstract: We present a parallel vector space based text retrieval
prototype implemented on a low-cost PC cluster running Linux operating
system, using the PVM message passing library. We also embed the inverted
file structure into our proposed prototype for fast retrieval. From
several experiments derived from the standard TREC-9 collection, this
prototype can index up to 500000 Web pages per hour using a simple x86
machine. We also obtain 5.4 seconds query response time on searching in the
one and a half million TREC-9 Web pages, using two machines. (15 Refs)
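
Illustrative note (not part of the retrieved record): a vector space retrieval model backed by an inverted file can be sketched in a few lines. The Python code below is a sequential, single-machine illustration only. It does not use PVM or any parallelism, and the function names, the whitespace tokenizer and the log(1 + N/df) weighting are assumptions rather than details taken from the paper.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Inverted file: term -> {doc_id: term frequency}."""
    postings = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            postings[term][doc_id] = tf
    return postings


def rank_documents(query, docs):
    """Rank documents by cosine similarity of tf-idf vectors."""
    postings = build_index(docs)
    n = len(docs)
    idf = {t: math.log(1 + n / len(p)) for t, p in postings.items()}
    # Squared length of every document vector.
    norms = defaultdict(float)
    for term, plist in postings.items():
        for doc_id, tf in plist.items():
            norms[doc_id] += (tf * idf[term]) ** 2
    # Only the posting lists of the query terms are read.
    scores = defaultdict(float)
    for term, qtf in Counter(query.lower().split()).items():
        for doc_id, tf in postings.get(term, {}).items():
            scores[doc_id] += (qtf * idf[term]) * (tf * idf[term])
    # The query vector length is the same for all documents, so it is omitted.
    return sorted(scores, key=lambda d: (-scores[d] / math.sqrt(norms[d]), d))
```

The inverted file is what keeps query time proportional to the posting lists of the query terms rather than to the size of the whole collection.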
Abstract: We identify crucial design issues in building a distributed
inverted index for a large collection of Web pages. We introduce a
novel pipelining technique for structuring the core index-building system 
that substantially reduces the index construction time. We also propose a 
storage scheme for creating and managing inverted files using an 
embedded database system. We suggest and compare different strategies for 
collecting global statistics from distributed inverted indexes. Finally, we 
present performance results from experiments on a testbed distributed Web 
indexing system that we have implemented. (36 Refs) 
Abstract: The active guidebook is a context-aware information management
system that uses a combination of spatial and keyword indexing to retrieve 
data. The system has three principal components: a new document description 
language extends HTML to include facilities for tagging with spatial 
locations. Retrieval uses two separate indexes - a segment tree is used for 
spatial indexing and an inverted file is used for keyword indexing. A 
user interface allows queries involving keywords and location data to be 
expressed, and presents their results. The system has been evaluated with 
the implementation of an interactive guidebook. The test data was drawn 
from existing Web pages describing the city of Cambridge in England, 
which were augmented with spatial information. A GPS system is used to 
provide the default location information for retrieval, but can be 
overridden with explicit coordinates. (10 Refs) 
Abstract: Information retrieval systems can be partitioned into two main
classes: large-scale systems that make use of an inverted index or some 
other auxiliary data structure, intended for massive volumes of data; and 
the small-scale systems based upon sequential pattern matching that most 
computer users employ when hunting for missing email and news items. In 
this paper we describe a hybrid approach that offers the ranked queries and 
similarity matching of a genuine information retrieval system, but does so 
without any need for an index to be precomputed. This software tool, which 
we call seft, offers performance that in a retrieval effectiveness sense 
matches conventional information retrieval systems, and in a resource 
efficiency sense, while considerably slower than grep-like tools, is fast 
enough to be useful on hundreds of megabytes of text. (19 Refs) 
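The index-free ranked search described in this abstract can be illustrated with a minimal Python sketch. It is not the seft algorithm: the TF-IDF style scoring, the function names `tokenize` and `rank_by_scan`, and the sample documents are all assumptions made for illustration. The only property it shares with the abstract is that nothing is precomputed; every query triggers a sequential scan of the text.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_by_scan(documents, query, top_k=3):
    """Rank documents for a query by scanning them; no index is built in advance."""
    query_terms = set(tokenize(query))
    doc_tokens = [tokenize(d) for d in documents]

    # Scan once to count how many documents contain each query term.
    doc_freq = Counter()
    for tokens in doc_tokens:
        for term in query_terms & set(tokens):
            doc_freq[term] += 1

    # Score each document with a simple TF-IDF style weight (an assumed formula).
    n = len(documents)
    scored = []
    for doc_id, tokens in enumerate(doc_tokens):
        counts = Counter(tokens)
        score = sum(
            (1 + math.log(counts[t])) * math.log(1 + n / doc_freq[t])
            for t in query_terms
            if counts[t] > 0
        )
        if score > 0:
            scored.append((score, doc_id))

    # Highest score first; ties are broken by document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]


docs = [
    "missing email about the budget",
    "news items and weather",
    "email email news",
]
print(rank_by_scan(docs, "email news"))
```

The cost of this approach grows with the size of the text on every query, which is the trade-off the abstract describes: slower than grep-like tools, but with ranked results and no index to maintain.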
Publisher: IEEE Comput. Soc, Los Alamitos, CA, USA 
Treatment: Experimental (X) 

Abstract: Many researchers have proposed classification systems that 
automatically classify email in order to reduce information overload. 
However, none of these systems are in use today. This paper examines some 
of the problems with classification technologies and proposes Relevance 
Categories as a method to avoid some of these problems. In particular, the 
dynamic nature of email categories, the cognitive overhead, required 
training categories, and the high costs of classification errors are 
hurdles for many classification algorithms. Relevance Categories avoid some 
of these problems through their simplicity; they are merely 
relevance-ranked lists of email messages that are similar to a set of 
query messages. By displaying messages as the result of a dynamic query in 
lieu of fixed categories, we hypothesize that users will be less sensitive 
to errors using the Relevance Categories scheme than to errors using a 
fixed categorization scheme. To study the effectiveness of the Relevance 
Categories concept, we devised a performance metric for relevance ranking 
and used it to test an inverted index implementation on the 
Reuters-21578 test collection. The promising test results indicate the need 
for further work. (9 Refs) 
Subfile: C 
Abstract: Much effort in IS is going into creating data warehouses. These 
are stores of data periodically extracted from older legacy applications, 
converted to common standards and made accessible for user analysis. The 
warehouse acts as a WORM (Write Once, Read Many times) storage. Where the 
extract and transfer is performed nightly, they provide access to what is 
termed "near operational" data and can be used to replace much of the 
existing reporting. In other cases they are used to store mostly historical 
data for analysis of trends, market impact, financial status and so on. 
While often implemented with a variety of different clean up tools, 
languages, database products and query tools, this article describes an 
implementation done almost entirely with APL. It includes a query 
capability termed "query by mail" which enables anyone with access to 
e-mail to send queries to the warehouse and receive responses or extracts 
of data by return mail. The "query" includes customized analysis of field 
content to allow identification of fields and records containing invalid 
data. Built upon a proprietary inverted file system, it provides rapid 
response to user queries and little load on the server system. (0 Refs) 
Abstract: Full text retrieval (FTR) technology has evolved very 
significantly during the late 1980s. Organisations are incorporating an FTR 
system into their corporate IT strategy, and integrating it with word 
processing, electronic mail, micro-mainframe links, integrated office 
systems and document image processing systems. The paper discusses the 
future requirements of FTR systems covering aspects such as recall and 
precision, inverted file indexing, optical character recognition (OCR) 
technology, multi-lingual capabilities, and automatic hypertext generator 
systems. (8 Refs) 
Categorisation is a useful method for organising documents into 
subcollections that can be browsed or searched to more accurately and 
quickly meet information needs. On the Web, category-based portals such 
as Yahoo! and DMOZ are extremely popular: DMOZ is maintained by over 56,000 
volunteers, is used as the basis of the popular Google directory, and is 
perhaps used by millions of users each day. Support Vector Machines (SVM) 
is a machine-learning algorithm which has been shown to be highly effective 
for automatic text categorisation. However, a problem with iterative 
training techniques such as SVM is that during their learning or training 
phase, they require the entire training collection to be held in 
main memory; this is infeasible for large training collections such as DMOZ 
or large news wire feeds. In this paper, we show how inverted indexes can 
be used for scalable training in categorisation, and propose novel 
heuristics for a fast, accurate, and memory efficient approach. Our results 
show that an index can be constructed on a desktop workstation with little 
effect on categorisation accuracy compared to a memory-based approach. We 
conclude that our techniques permit automatic categorisation using very 
large training collections, vocabularies, and numbers of categories. 
Construction; Performance evaluation 
The need for efficient tools to manage, retrieve and filter the 
information in the WWW is clear. Web directories are taxonomies for the 
classification of Web documents. This kind of information retrieval 
system presents a specific type of search where the document collection is 
restricted to one area of the category graph. This paper introduces a 
specific data architecture for Web directories that improves the 
performance of restricted searches. That architecture is based on a hybrid 
data structure composed of an inverted file with multiple embedded 
signature files. Two variants are presented: hybrid architecture with total 
information and with partial information. This architecture has been 
analyzed by means of developing both variants to be compared with a basic 
model. The performance of the restricted queries was clearly improved, 
especially the hybrid model with partial information, which yielded a 
positive response under any load of the search system. 
This paper describes the design and implementation of a system for 
computer generation of linked HTML documents to support information 
retrieval and hypertext applications on the World Wide Web. The approach is 
based on work by Salton and others, but extends the concept to be 
compatible with the World Wide Web browser environment by adding an 
interactive indexing technique that is well suited to the mouse-based 
point-and-shoot input common to windowed browsers. The system does not 
require text query input, nor any client or host processing other than 
hypertext linkage. The goal of this work is to construct a fully automatic 
system in which original text documents are read, and processed by a 
computer program that generates HTML files, which can be used immediately 
by Web browsers to search and retrieve the original documents. Thus, a user 
with a large collection of information (for instance, newspaper articles) can 
feed these documents to the program described here and produce directly, 
without further human intervention, the necessary files to establish World 
Wide Web home and related pages, to support interactive retrieval and 
distribution of the original documents. 
... by providing a service (including but not limited to finding 
information). Services provided might include advertising opportunities, 
a research service (e.g., Find/SVP, or travel services. * Technology: 
Products that are. . . 

...searchers. * Intranet Development: Products that assist in the 
development or maintenance of an Intranet or Internet Web site , such 
as Inmagic DB/Text Webserver. 

These are loose categorizations only; several products cover more 
than. . . 

. . . you need . 

OK, here goes .... 
A Business Compass 

Does the world really need a new online Web directory ? Probably 
not, but it certainly could use a better one or two. Quite possibly, A. . . 

...Business Compass LLB executive vice president, he took the service live 
in mid-December. This Internet directory and analysis service 
pre-screens and profiles leading Web-based sources of business information 
and... www.adobe.com/). 

Adobe Pagemill 2.0 for the Macintosh, which enhances and speeds up 
Web page design, and Adobe's Photoshop 4.0 for Windows and Macintosh 
platforms, which supports digital... 

. . .may select an interface language from 18 different languages, and the 
browser will accurately display Web pages authored in any of over 90 
world languages. Arabic and Hebrew characters display properly in... 

. . .and multimedia content. 

The Tango 2.5 browser is available for downloading from the Alis Web 
site (http://www.alis.com). They let you "test-drive" it free for up to 
30. . . 

...0, which will integrate Tango Mail, a feature allowing easy, automatic 
creation and viewing of e - mail in dozens of languages. 

These and other communication and translation services and tools are 
part . . . 

. . . surcharge . 

Citizen 1 Software, Inc. 

Continuing the trend toward making the retrieval of information from 
Web sites at least as easy and fast as locating information from 
print-based resources, Citizen 1... 

...annual cost-per-seat basis. 
Citysurf 

Virtual Media Services, of Tulsa, Oklahoma, demonstrated and sought 
advertisers for a nationwide business Net directory. Citysurf users can 
find a business in over YO...the national Bigbook, Big Yellow, US WEST, and 
Athand directories. They have especially targeted local advertising and 
local sites. Citysurf offers advertisers the opportunity to buy banner 
ads or to be listed in the "Top 10" of a specific city in their business 



...workflow, software to colleges and universities, hyperlinks to more than 
2,000 college and university home pages . 
Direcpc 

DirecPC, from Hughes Network Services, is the U.S. domestic 
alternative for satellite Internet... 

...stores began retailing DirecPC in California in October, and if you go 
to DirecPC's Web site (http://www.direcpc.com), you can get a listing 
of retail suppliers in your area... 

...I'll continue to hope. 
Dun & Bradstreet 

D&B and Lycos, Inc. announced the co-branded Web site 
CompaniesOnline (http://www.companiesonline.com), a free 
business-to-business directory featuring detailed information on... 

...search criteria; 2) a detailed company information page; and 3) a page 
with company-supplied advertising information. The site includes banner 
advertising . 

D&B brings company-specific information from its database of more 
than 40 million companies... 

...which contains detailed company data supplied by D&B, users can link to 
the company's Web site or a page containing advertising information 
supplied by the company. Advertisers pay an annual subscription fee for 
inclusion. Business users can also obtain company information by linking 
directly to D&B's Web site and purchasing a $20 Business Background 
Report on the company by using a credit card. . . 

...the business name and location (city and state); the company information 
page includes mailing address, e-mail address, D&B D-U-N-S Number, 
telephone number, trade-style name, company size... 

...legal status, parent company name, contact name and title, and company 
URL. Information on the advertising page varies by company and requires 
registration with Lycos. 

D&B also demonstrated its Marketing. . . 

...Excite, Hotbot, Infoseek, Infoseek Ultra, Lycos, Magellan, Open Text 
Index, Webcrawler, and Yahoo!. Reports are e-mailed to requesters and 
can be kept on file as an indication of the status of... Family & 
Relationships, and Health. Unlike typical Web search engines, FINDOUT's 
libraries direct users to sites on commercial online services and 
offline articles, books, organizations, guides, videos, software, and 
CD-ROMS. 

Find/SVP is. . . 

. . .discussion with prospective sponsors and with leading membership and 
subscription-based service organizations, ISPS, and Web sites 
interested in providing on-demand answers as a value-added customer 
benefit. Find/SVP is... 

. . . FINDOUT button" on their own sites as a service promoting, brand loyalty 
and repeat visits. Advertising and sponsorship opportunities and 800- or 
900-number telephone access options are also in the. . . 

...Forefront is in the forefront. Their Webwhacker offline browsing product 
created the standard for downloading Web sites to a user's hard drive. 
Webseeker, now in an enhanced version 2.2, runs... 

...offline. Keyword highlighting enables users to quickly locate searched 
words within the text of the Web pages opened. The enhanced version 
includes a scheduling interface and allows the setting of download control 
limits . 

At COMDEX, Forefront announced Webprinter 2.0, an application that 
allows users to turn Web pages into attractive, double-sided, hardcopy 
booklets. Webprinter intercepts Web pages on the way to the printer, 
automatically reducing, rotating, and realigning them to print as... 



...CompuServe, America Online, Prodigy, Mosaic, and other browsers. Trial 
versions are available at the Forefront Web site (http://www.ffg.com). 
Webprinter is based on technology found in Forefront's more powerful... 

...navigate, and organize into a seamless interface. 
GRIT 

If you have tired of reading about Internet sites , you can hear 
about them instead on GRIT (Gould Resources & Internet Telecommunications), 
"the world's. 

...day, seven days a week, via RealAudio and Streamworks, at 
http://www.grit.com. The Web site features live talk about sports, 
exercise, music, technology, politics, etc., and also reviews Web sites 
. It also contains a search engine, a digital photograph gallery, and a 
link to GRIT's CU-SeeMe reflector site. Listeners may visit other Web 
sites while the shows play in the background. 
HDS Network Systems 

At Internet World, I had... the Fly Conference. The software makes it 
easy for users to post messages on a Web site in an organized manner, 
in either public or private modes. Conference supports private conferences 
with. . . 

...intelligently crawl more than 10 million Web documents per day and is 
said to index Web site content up to three times faster than any other 
technology available. HotBot received the PC... 

...the formation of the NewsPage Network (NPN) , which makes daily 
customized news available to other Web sites . Designed as a traffic 
builder for specialized content sites and a wider distribution area for... 

...future than they do now. You can already see traces of NewsPage on the 
following Web sites : MSNBC, Quicken Financial Network, InfoSeek, Achoo 
Online Healthcare, Kleiner Perkins Caufield & Byers, All Things... 

...bed and breakfast in Northern California, the new Ultrasmart not only 
provides a listing of Web pages in response to a query, but also finds 
material on related topics, such as Hotels... 

...World. One of those was InMagic, Inc., who announced shipment of DB/Text 
Webserver, the Internet and Intranet site management version of the 
venerable and versatile InMagic database/textbase software. A Word Wheel 
feature. . . 

...words in a DB/Text Webserver database. 

Traditional online searchers may recognize this as an inverted 
file, ROOTing, or Expanding, but the word wheel terminology puts a nice 
spin on the old ... Internet or Intranet." Possible implementations of 
Messenger include clickable news headlines, headlines hotlinked to 
corresponding Web pages, advertising and "billboards" that register 
clicks to indicate interest for initiating sales processes; and "priority 
messages" that deliver highly urgent communications and links to Web 
pages for more information or action. 

The Internet Company also produces NewsSpace, a search and crawl tool 
for Web sites and provides consulting for Internet-related design, 
development, programming, and marketing. Its client list includes... 

...Hewlett-Packard Company and Live Picture, Inc. jointly announced the 
arrival of an Imaging for Internet Web site, a public beta site 
demonstrating technology for viewing, sharing, and printing high-resolution 
images from the Internet. The site (http://www.image.hp.com) allows 
the display of photo-rich content and enables users to download the imaging 
for Internet technology. Currently, the Web page includes content 
samples from Corbis Corporation Photo Collection (selections from award 
winning photographers, museums, and. . . 

...1997 issue, NC World grows to full size as a stand-alone electronic 
publication, an advertiser-supported monthly magazine with mid-month 
updates. Access is free. Subscribers who fill out a detailed demographic 
form will receive e-mail alerts of new articles. (http://www.ncworldmag.com). 

NetCarta Corporation 

Netcarta introduced version 2.0 of its WebMapper content management 
software, which creates an object-based, high-level map of Web sites 
for WebMasters. NetCarta WebMapper 2.0 automates most common site 
management tasks required in the maintenance of Internet and Intranet 
sites . 

WebMapper creates a Tree View, which provides an orderly and 
easy-to-read hierarchical outline... 

...new "Cyberbolic" view. This Web-like gestalt view shows site objects 
fanning out from the home page , then the next level of pages, and so 
on, effectively giving a bird's eye... 

...average amount of time it takes end users to access a selection of 
high-traffic Web sites . 

ZDNet. Sweep should debut in the first quarter of 1997, published in 
three places: the ZD Net Web site (http:// www.zdnet.com), where it 
will update several times each day, Interactive Week. . . 
...compare their daily access and retrieval times to the average 
performance of selected high-traffic Web sites . 

The net. Sweep polling system is based on a distributed network across 
the United States... 

...are established in major cities and through multiple Internet Service 
Providers (ISPs) . The system accesses Web sites in the ZDNet. Sweep 
index over 500 times per day, seven days a week, resulting ... introduced 
Netscape Communicator, a comprehensive set of Internet and Intranet 
component software that integrates open e-mail, groupware, editing, 
calendaring, and browsing tools to allow users to easily communicate, 
share, and access... 

...Navigator (the popular Web browser), Netscape Messenger (allowing the 
composition, sending and receiving of encrypted e-mail, using 
open-standard based mail), Netscape Collabra (facilitating collaboration 
with co-workers and leveraging corporate... 

...team; posted notes can have an expiration date and urgency markings to 
denote importance); and E-mail Notification (users can subscribe to get 
notices via e-mail when something significant, such as an object, 
message, or task, changes in the project.). 
Paperclip. . . 

...Microsoft Internet Explorer 3.0, and Spyglass Mosaic 2.10 or above. 
WebClip automatically checks Web pages, identifies whether any changes 
have occurred on a page, and sends the new content to a user's PC. Users 
can then display the same Web pages at hard drive speed, with all 
graphics and hotlinks intact. Users can schedule unattended monitoring... 

...to maintain collections of Web content even when the content no longer 
resides on the Web site itself. 

PaperClip has expanded its retail distribution to individuals and 
enterprises via numerous chains and. . . 

...spiders. ATI also allows users to create intelligent agents to 
search newsgroups and Web sites, with e-mail notification of 
results. 

In a statement that will ring true to experienced searchers, PLS 
founder . . . 

...Virtual Address Book if you lose yours. Best of all, you can access 
PlanetAll by Web, e-mail, touch-tone phone, or fax. Also offered is a 
Special Occasions Reminder, which alerts members ... Spectrum and Find/SVP), 
QUALCOMM's Eudora can rightfully claim to be the leader in Internet 
e-mail software. At Internet World, QUALCOMM announced the availability of 
Eudora Light 3.0.1 for... 



.download at http://www.eudora.com. 



Eudora Light is known as an easy-to-use e-mail solution for new 
Internet users. Many ISPs distribute it for free. Major new features of 
Eudora Light 3.0.1 include basic filters to organize e-mail 
automatically, an enhanced Find dialog, mail server interactive control, 
drag and drop capabilities for file. . . 

...features (not in Eudora Light 3.0.1 freeware) include a user 
controllable toolbar, multiple e-mail accounts, enhanced filters, 
including auto-reply and forwarding, spell checking, composition of 
stylized text, expanded. . . 

...features to its travel-related services. Launched a year ago, 
Travelocity has become a popular Web site for do-it-yourself travel 
arrangers. Users can choose from among 50 car rental companies ... the speed 
and cost-effectiveness of electronic delivery. The Posta user enters a 
recipient's e-mail address (or a mailing list) and selects a document 
to send. Tumbleweed Posta places the document on a powerful server, which 
sends an e-mail notification to the recipient. When recipients receive 
notification of a document's arrival through their e-mail client 
software, they simply click on an address to receive the document. 
Tumbleweed Posta users. . . 

...deliver an appropriate viewer or plug-in with the document, when needed. 
Anyone with an Internet e-mail account can receive Posta documents, 
regardless of their recipient e-mail system or hardware and software 
capabilities. Tumbleweed Posta users can track documents through every step 



...a photographic memory of Web sessions. While you surf, ZooWorks records 
and indexes all the Web pages and documents you visit. This 
information, along with the content of the HTML document, is... for 
searchers. Intranet: Products that assist in the development or maintenance 
of an Intranet or Internet Web site . 

How to Talk to Internet World Exhibitors: 

* Tell them "I'm happy to pay for... 
simply standardize the multiple formats encountered when deriving 
documents from various providers or handling various Web sites. The 
steps serve to merge all the data into a single consistent data structure 
that... 

...later steps of document processing. Step two is important because the 
pointers stored in the inverted file will enable a system to retrieve 
various sized units — either site, page, document, section, paragraph... 
S1  6843167  ADVERTIS? OR ADVERTIZ? OR AD OR ADS OR PROMOTION? OR ADVERT? ? OR COMMERCIAL()MESSAGE? 

S2     1800  (INVERT?)(2N)(INDEX? ? OR INDICES OR FILE OR FILES OR FILING? OR LIST OR LISTS OR LISTING? OR STRUCTURE?) 

S3  7309091  (INTERNET OR WEB OR ONLINE OR ON()LINE OR HOME)(2N)(PAGE OR PAGES OR SITE OR SITES OR PORTAL? OR DIRECTOR?) 

S4  4079577  (E OR ELECTRONIC OR DIGITAL OR VIRTUAL)(1W)(MAIL??? OR MESSAG??? OR CORRESPOND?) OR EMAIL???? OR (INTERNET OR ON()LINE OR ONLINE OR WEB)(1W)MAIL???? OR MIME OR SMTP OR POP(IN)MAIL 
...ABSTRACT: be considered that would protect everyone. Online commercial 
services and searchable database archives on publisher Web sites should 
continue to maintain the inverted file index terms and tags that 
identify material barred from full-text delivery by the Tasini decision. 
The inverted file indexes belong to the host services, and they could 
continue to identify relevant articles, at... 

...TEXT: access to all New York Times book reviews on its own 
http://www.nytimes.com Web site. 

The decision clearly gave freelance authors electronic reproduction rights, 
providing they didn't have written. . . 

... Of course, then the National Writers Union, plaintiff in the Tasini 
case, took out an ad urging writers to not take the Times up on that 
offer unless they got compensated. . . 

...across the Web picking up information as they go? If The New York Times' 
own Web site has lost all its book reviews, then why not use the ones 
you find on... of the material. 

Another Solution 

Specifically, online commercial services and searchable database archives 
on publisher Web sites should continue to maintain the inverted file 
index terms and tags that identify material barred from full-text 
delivery by the Tasini decision. The inverted file indexes belong to 
the host services, regardless of the fact that all the terms were generated 
from text produced by authors, freelance or otherwise. If inverted file 
indexes remain complete and comprehensive, they could continue to identify 
relevant articles, at least by. . . 

. . . Once the linear file is created, search engine software processes the 
text to generate an inverted file index. In the case of full-text 
databases, that usually means taking every word in the. . . 

. . . back to the full linear file document record. When users search, they 
only use the inverted file index until they create a set of search 
results. When they display all or part of... 

...back to the original linear file to gather the documents. 
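The relationship these excerpts describe, a linear file of document records plus an inverted file whose entries point back to those records, can be sketched in a few lines of Python. This is an illustrative sketch only: the names `build_inverted_file`, `search`, and `display`, and the sample records, are assumptions and do not come from any host service's software.

```python
import re
from collections import defaultdict


def build_inverted_file(linear_file):
    """Map every word to the sorted record numbers of the records containing it."""
    inverted = defaultdict(set)
    for record_no, text in enumerate(linear_file):
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            inverted[word].add(record_no)
    return {word: sorted(records) for word, records in inverted.items()}


def search(inverted, word):
    """Searching touches only the inverted file and yields a set of record numbers."""
    return inverted.get(word.lower(), [])


def display(linear_file, result_set):
    """Displaying follows the pointers back to the linear file to fetch the records."""
    return [linear_file[record_no] for record_no in result_set]


# Hypothetical linear file: one string per document record.
linear_file = [
    "Tasini decision on freelance rights",
    "Book reviews archive",
    "Freelance authors and book reviews",
]
inverted = build_inverted_file(linear_file)
hits = search(inverted, "freelance")
print(hits)
print(display(linear_file, hits))
```

In this sketch, removing a record's text from `linear_file` while leaving `inverted` untouched would still let `search` report that the record exists, which is the separation the proposal relies on.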

In this proposal, the underlying inverted file index upon which the 
searching process rests would continue to retain all the index terms 
generated. . . 

... articles. But since no one could ever re-create a whole document from 
using the inverted file indexing, that indexing constitutes a new 
creation and one copyrighted to the online service. In... 

. . . search services, usually that just means that someone has shut down the 
links between the inverted file index and the linear file containing 
the documents as documents. In most cases, services will only... 

... complete as possible. We all know that no full-text archive is really 
complete: no ads, usually no graphics, usually no letters to the editor, 
often no short news items, sometimes ... comprehensive after all-for once. 

Here and now, I promise all commercial hosts and publisher Web sites 
that I, for one, will beat my drums as loud as I can beat them. . . 

... new ballgame. I call on searchers everywhere who agree with this 
approach to send me e-mail messages (bquint@mindspring.com) of 
support. I promise to forward them on to the relevant executives... 

. . . editor in chief of Searcher, contributing editor for NewsBreaks, and a 
longtime online searcher. Her e-mail address is bquint@mindspring.com. 
...TEXT: engine uses patent-pending PageRank technology to perform an 
objective measurement of the importance of Web pages. This measurement 
is calculated by solving an equation of 500 million variables and more than 



...to commercial sites, and Google SiteSearch, which adds search capability 
to information within a specific site . ONLINE publisher Jeff Pemberton 
talked with Larry Page, now Google's CEO, to find out more... 

...is proportional to the number of choices you have. So if you go to a 
Web page with a lot of links, it actually takes longer to do a search 
there than. . . 

...than if you saw just a static summary. 
PEMBERTON: You say you store the entire Web? 

PAGE: Yes, we store the entire Web that we've indexed. 

PEMBERTON: When a query comes... 

...words occurred. That part of it is a pretty typical system. 
PEMBERTON: It's an inverted file? 

PAGE: Yes, an inverted index or an inverted file. We still use that 
technology. It's good technology. It's why search engines work... 

...the Web and find things. Of course, that's not very practical. 

We have an inverted index, and we do Boolean AND queries. All the terms 
you use in your query have... 
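A Boolean AND query over an inverted index, as mentioned in this excerpt, amounts to intersecting the posting lists of the query terms. The Python sketch below shows one common way to do that with sorted lists. It is an assumption-laden illustration: the function names `intersect` and `boolean_and`, the toy index, and the page numbers are hypothetical, and nothing here is taken from the search engine being discussed.

```python
def intersect(a, b):
    """Merge-style intersection of two sorted posting lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def boolean_and(index, terms):
    """Return the page ids that contain every query term."""
    postings = [index.get(term, []) for term in terms]
    if not postings:
        return []
    # Start with the shortest list so intermediate results stay small.
    postings.sort(key=len)
    result = list(postings[0])
    for other in postings[1:]:
        result = intersect(result, other)
        if not result:
            break
    return result


# Hypothetical index: term -> sorted list of page ids.
index = {
    "stanford": [1, 3, 5, 8],
    "home": [1, 2, 3, 8, 9],
    "page": [3, 8, 10],
}
print(boolean_and(index, ["stanford", "home", "page"]))
```

Processing the shortest posting list first is a design choice, not a requirement; it keeps the running result no larger than the rarest term's list.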

. . . good. 

Another part of our technology is that we're looking at not only the Web 
page, but we're looking at the pages around it in the hypertext. So if 
someone . . . 

... a malicious person wants people to think he's Stanford, so he takes 
Stanford's home page and copies it onto his own server. Just looking at 
the text of the page. . . 

...plenty of clues sprinkled around. One is that lots of people link to the 
Stanford home page, and the people who link to the Stanford page also 
tend to have a lot... 

... not just random pages. There's plenty of quality pages that link to the 
Stanford home page, and that's significant, too. So when we run a 
search, we look at all... 

...It's stored in our index. 

PEMBERTON: So that's existing in parallel with the inverted index? 

PAGE: Yes, it's part of the inverted index. As a first approximation, 
that kind of information is stored in our index, so when. . . 

... thousand links out there that point to Stanford, and it's mentioned in 
context. The home page is mentioned in all those places. It's a pretty 
good measure that it's... ...s say, Yahoo! decided ONLINE is really great, so 
they create a link on their home page. Yahoo! gets half a billion or so 
page views per day. That would be very significant... 

...thinking about it is surfing the Web at random. If I gave you a random 
Web page and you clicked on a link at random, you would get another 
page. Then you... 

...link at random. Just keep doing that: click, click, click. Then, after you 
visit every page on the Web, I'll count up how often you've visited 
each one. It turns out that... 
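The random-surfing picture in this excerpt can be simulated directly: follow random links, count visits, and treat the visit share as a page's importance. The Python sketch below does that on a toy link graph. Several details are assumptions rather than anything stated in the interview: the `random_surfer` name, the `jump_prob` chance of restarting from a random page (also used to escape pages with no links), the fixed seed, and the four-page graph.

```python
import random
from collections import Counter


def random_surfer(links, steps=100_000, jump_prob=0.15, seed=0):
    """Estimate page importance as the share of visits by a random clicker.

    links maps each page to the list of pages it links to; every target
    must itself be a key. jump_prob is an assumed restart probability.
    """
    rng = random.Random(seed)
    pages = sorted(links)
    visits = Counter()
    page = rng.choice(pages)
    for _ in range(steps):
        visits[page] += 1
        outgoing = links[page]
        if not outgoing or rng.random() < jump_prob:
            page = rng.choice(pages)      # start over from a random page
        else:
            page = rng.choice(outgoing)   # click a random link
    return {p: visits[p] / steps for p in pages}


# Hypothetical graph: three pages link to "stanford", which links back to them.
links = {
    "stanford": ["a", "b", "c"],
    "a": ["stanford"],
    "b": ["stanford"],
    "c": ["stanford"],
}
ranks = random_surfer(links)
print(max(ranks, key=ranks.get))
```

The page that many others link to collects the largest share of visits, matching the intuition in the interview that links from other pages signal importance.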

...around the Web. It doesn't take into account people searching or hearing 
stuff [promoting Web sites] on the radio or whatever, but as people see 
that stuff, it gets reflected in same site. Will you show only the home 
page? 

PAGE: No, we'll probably return whatever ranks highest. 

PEMBERTON: How has your company grown since. . . is sort of a de facto 
Internet standard. You just create a file on your Web site. It has a 
specific name, and there's a way of describing certain documents that... 

...of nice for consumers. And Yahoo!, for example, just had a great quarter 
based on advertising revenue. It's a real business making a lot of money. 
That seems to be... 

...a business standpoint, it has a lot of benefits. You don't have to have 
advertising. You're paying content creators for whatever they've created, 
and that's really a... 

... Yes, absolutely. The example I'll give you is RedHat. They say, "I have 
a Web site. I want search over it, but I don't really know how to do it 
. . .we can keep innovating. 

Jeff Pemberton (jeffp@onlineinc.com) is Publisher of ONLINE magazine. 
Comments? Email letters to the Editor to editor@onlineinc.com. 
user and how these design decisions can be influenced by designers, 
writers, and authors of Web pages. 

Selection is a slippery slope. Whether we choose materials by quality 
of data, authority, or... have chosen. Others want to keep you on their site 
to get additional revenue from advertisements that pop up every time you 
search or display a page. 

Correcting False Assumptions About... 

...of terms to search against. This is created by gathering the text of 
millions of Web pages and then creating an alphabetical list of words 
with ties to their locations in each Web page indexed, an inverted 
index. It is the inverted index that users search when they use a Web 
search engine, not the original pages. In... 

...then follows links from those pages to add others. But there are over a 
billion pages on the Web, and to make matters worse, those pages appear 
and disappear with no warning. The sheer quantity of Web pages makes 
building an effective Web search engine a Sisyphean endeavor. 
To add to the confusion. . . 

...with deep rich archives of information go online, this is an
increasingly common design for Web sites: Pages are stored in a
database as plain text and dynamically turned into HTML only after...

...are still millions, perhaps billions, of pages. In order to keep up with
the newest pages, Web search engines often take submissions from owners
or designers of pages so as to know...

...list of sites that are crawled regularly. Even so, only a small 
percentage of the Web pages are crawled. It is not feasible to crawl 
them all. Here again the question of... 

...often? A newspaper site crawled once a week would miss 6 days of news.
Normal Web sites, on the other hand, may change at most once a month or
even once every... AND with some latitude for finding additional good
matches.

Some search engines index the whole Web page, while others only
index the first 500-1,000 words. Some use the number of...

...fuzzy AND. Full-text indexing may be added or subtracted. And, of 
course, many more Web pages will be crawled, URLs dropped, and dynamic 
databases added. Ranking algorithms will be subtly altered. . . 

...long for them to go to each page and search it separately. Consequently,
if a Web page has been moved or removed, the fact of its nonexistence
may not be discovered for...

...less than technically proficient business to sue its previous Internet 
Service Provider because its previous Web page kept being retrieved 
after it had already moved to another ISP. The business owners were... 

...hand, yield direct answers to specific questions. In the case of Yahoo!,
the best-known directory on the Web, experts select sites on a
subject and organize the listings according to a subject scheme that groups
together...

...as being good navigation tools. However, directories like Yahoo! cannot
possibly contain the number of sites that the Web does, nor would one
want them to. The strength of a directory lies in its ... academic and
research sites, while others are more product-oriented.

* Factors that increase revenues from advertising or other sources 
also improve the quality of the search results. Goto.com was the... 

...the "serious" players in the field, like using the yellow pages and 
scanning for boxed ads or at least ones in boldface. 

However, business relationships or advertisements can more subtly 
skew the information offered. While other search engines besides Goto.com 
claim. . . 

...adjoining column on the results page. This may provide links to other
related sites, to advertising partners who sell related products, or to
collections about the topic being searched. For instance ... have been giving
higher rankings to sites that pay for being returned. Similarly, spamdexing
by Web sites in order to obtain high rankings or inappropriate listings
has also angered searchers. However, the...

...a high weight in search. 

* How many links there are to a page from other pages on the Web.

* Other unpredictable factors like popularity (Direct Hit), whether 
the site has paid for its ranking. . . 
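
One ranking factor listed above is how many links point to a page from other pages. The sketch below counts those inbound links for a small hypothetical link graph and orders candidate pages by that count alone. The graph, the page names, and the tie-breaking rule are assumptions for illustration; an actual engine would combine this signal with the other factors mentioned.

```python
from collections import Counter


def count_inbound_links(link_graph):
    """Count, for each page, how many other pages link to it."""
    counts = Counter()
    for source, targets in link_graph.items():
        # A page's links to itself and its repeated links count only once or not at all.
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return counts


def rank_by_inbound_links(candidates, counts):
    """Order candidate pages by inbound-link count, highest first; ties by name."""
    return sorted(candidates, key=lambda url: (-counts[url], url))


# Hypothetical link graph: page -> pages it links to.
link_graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["c", "a"],
}
counts = count_inbound_links(link_graph)
ranking = rank_by_inbound_links(["a", "b", "c"], counts)
```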

...in trouble.

* The whole Web is crawled.

* All search engines search the same set of Web pages.

* Search engines search the actual page at the time of searching. 

* All search engines work. . . 

...to deal with it: the author of either the document or the metadata or 
the Web page design. 

Web search engines dance a wary minuet ...unwitting user may see. 
Sometimes this is useful -- since Web crawlers index only text while Web 
sites may want to present words as graphics, e.g., corporate logos, the 
search engine page... 

...are better matches. 

* Goto.com sells high rankings to the highest bidder. 

* Finding a company's home page is sometimes difficult. Many search
engines have tried to resolve such searches ahead of time... 

...MSN (Microsoft Network), use RealNames for this purpose. RealNames 
promises to select the one right home page site for 
...ABSTRACT: perform the same basic function: starting with a query, they
point to the best-matching Web pages. To speed up the matching process,
search engines continuously sift through millions of pages on the Web,
creating a huge central index of words and phrases against which the query
is compared. To build their indexes, search engines typically use crawlers.
Most engines build an inverted index, in which each word points
directly to all the documents it occurs in.
Detailed Description

the searches of block 400 and block 402 is a listing of rows of the
inverted index or indexes containing the searched information. Each
row contains the information associated with a search listing of the pay for
performance database along with all the text of the web page
associated with the search listing. In the illustrated embodiment, the
search listing includes the advertiser's search terms, the URL of the
web page, a title and descriptive text.

At block 404, the returned related search results are sorted... set canon 
cnt $cnt where 

cannon-search-text = $cst" > 

</SQL> 

</SQL> 



</SQL> 
</a> 

</script> 

Aggregates web page body-text and
listings based on the related-search
result, while collecting and creating
derived-data of 1, how many different
advertisers have web-pages associated
with the related-search result.

<script language=vortex> 
<timeout = -1></timeout>
<DB = /home/goto. . . 
Detailed Description

Claims 

Fulltext Word Count: 23087 
English Abstract 

One embodiment of the present invention provides a system that characterizes a
document with respect to clusters of conceptually related words. Upon
receiving a document containing a set of words, the system selects
"candidate clusters" of conceptually related words that are related to
the set of words (Figure 22, 2202). These candidate clusters are selected
using a model that explains how sets of words are generated from clusters
of conceptually related words (Figure 22, 2204). Next, the system
constructs a set of components to characterize the document, wherein the
set of components includes components for candidate clusters (Figure 22,
2206). Each component in the set of components indicates a degree to
which a corresponding candidate cluster is related to the set of words
(Figure 22, 2208).

French Abstract 

Un mode de realisation de la presente invention concerne un systeme
destine a caracteriser un document par rapport a des groupes de mots
associes conceptuellement. Lors de la reception d'un document contenant
un ensemble de mots, le systeme selectionne des "groupes candidats" de
mots associes conceptuellement associes a un ensemble de mots. Ces
groupes candidats sont selectionnes au moyen d'un modele expliquant
comment les ensembles de mots sont generes a partir de groupes de mots
associes conceptuellement. Ensuite, le systeme construit un ensemble de
composants en vue de caracteriser le document, cet ensemble de composants
comprenant des composants destines a des groupes candidats. Chaque
composant dans l'ensemble de composants indique un degre selon lequel un
groupe candidat correspondant est associe a l'ensemble de mots.
Detailed Description

Claims 

Fulltext Word Count: 11194 
English Abstract 

Advertisers are permitted to put targeted ads on e-mails. The present 
invention may do so by (i) obtaining information of an e-mail that 
includes available spots for ads, (ii) determining one or more ads 
relevant to the e-mail information, and/or (iii) providing the one or 
more ads for rendering in association with the e-mail. 

French Abstract 

Les annonceurs peuvent inserer des annonces publicitaires ciblees dans
des messages electroniques. Pour ce faire, la presente invention consiste
(i) a obtenir des informations d'un message electronique comprenant des
espaces disponibles pour des annonces publicitaires, (ii) a determiner
une ou plusieurs annonces publicitaires pertinentes selon les
informations de message electronique, et/ou (iii) a fournir ces annonces
publicitaires en vue d'une restitution en association avec ce message
electronique.
Detailed Description

Claims 

Fulltext Word Count: 7243 
English Abstract 

Targeting information (also referred to as ad "serving constraints") or 
candidate targeting information for an advertisement is identified (1). 
Targeting information may be identified (410) by extracting topics or 
concepts (420) from, and/or generating topics or concepts based on, ad 
information, such as information from a Web page to which an ad is linked 
(or some other Web page of interest to the ad or advertiser). (400). The 
topics or concepts may be relevant queries associated with the Web page 
of interest, clusters, etc. 

French Abstract 

Selon l'invention, des informations de ciblage (egalement appelees
"contraintes de service" d'annonces publicitaires) et des informations de
ciblage candidates pour une annonce publicitaire sont identifiees. Ces
informations de ciblage peuvent etre identifiees par extraction de sujets
ou de concepts a partir d'informations d'annonces publicitaires, et/ou
par generation de sujets ou de concepts sur la base d'informations
d'annonces publicitaires, telles que des informations issues d'une page
Web a laquelle une annonce est liee (ou une autre page Web d'interet pour
l'annonce ou l'annonceur). Ces sujets ou ces concepts peuvent etre des
demandes pertinentes associees a la page Web d'interet, des groupes, etc.
Detailed Description

Claims 
English Abstract

Advertisers (110) are permitted to put targeted ads on a page on the web
(or some other document of any media type) (Figure 1). The present
invention may do so by (i) obtaining content that includes available
spots for ads (120), (ii) determining ads relevant to content, and/or
(iii) combining content with ads determined to be relevant to the
content.

French Abstract 

Des annonceurs peuvent mettre des annonces ciblees sur une page web (ou 
sur d'autres documents de n'importe quel type de moyen de communication). 
La presente invention concerne un procede consistant : 1) a obtenir un 
contenu comprenant des messages publicitaires disponibles pour des 
annonces, 2) a determiner quelles sont les annonces pertinentes par 
rapport au contenu, et/ou 3) a combiner le contenu avec des annonces 
considerees pertinentes par rapport au contenu. 
Detailed Description

Claims 

Fulltext Word Count: 6511 
English Abstract 

The relevance of advertisements to a user's interests is improved (Figure
4). In one implementation, the content of a web page is analyzed to
determine a list of one or more topics associated with that web page
(420). An advertisement is considered to be relevant (440) to that web
page if it is associated with keywords belonging to the list of one or
more topics. One or more of these relevant advertisements may be provided
for rendering in conjunction with the web page or related web pages.
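
The matching rule in this abstract (an ad is relevant if one of its keywords belongs to the page's topic list) can be sketched as follows. The abstract does not say how topics are derived from page content, so the word-lookup step here is a stand-in assumption, and the topic vocabulary, ad identifiers, and function names are hypothetical.

```python
import re


def page_topics(text, topic_vocabulary):
    """Return the topics whose indicative words appear in the page text.

    topic_vocabulary maps topic -> set of indicative words. This lookup is
    an assumed stand-in for whatever content analysis the patent uses.
    """
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return {topic for topic, vocab in topic_vocabulary.items() if words & vocab}


def relevant_ads(topics, ads):
    """Return ids of ads with at least one keyword in the page's topic list."""
    return sorted(ad_id for ad_id, keywords in ads.items() if keywords & topics)


# Hypothetical data for illustration only.
topic_vocabulary = {
    "cameras": {"camera", "lens"},
    "travel": {"flight", "hotel"},
}
ads = {
    "ad-1": {"cameras"},
    "ad-2": {"travel"},
    "ad-3": {"cameras", "travel"},
}
topics = page_topics("Choosing a lens for your new camera", topic_vocabulary)
matches = relevant_ads(topics, ads)
```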

French Abstract 

Selon l'invention, la pertinence des annonces publicitaires par rapport
aux interets de l'utilisateur est amelioree. Dans un mode de realisation,
le contenu d'une page web est analyse afin de determiner une liste d'un
ou de plusieurs sujets associes a cette page web. Une annonce
publicitaire est consideree pertinente pour cette page web si elle est
associee a des mots cles appartenant a la liste d'un ou de plusieurs
sujets. Une ou plusieurs de ces annonces peuvent etre fournies avec
ladite page web ou les pages web associees pour la presentation.